Skip to content

Conversation

@zachschuermann
Copy link
Member

@zachschuermann zachschuermann commented Oct 16, 2025

What changes are proposed in this pull request?

Adds a new required method: copy_atomic(&self, src: &Url, dest: &Url) -> DeltaResult<()> to StorageHandler. This PR also adds support for the default engine via the (dumb) way of GET/PUT. Note that I've elected to pursue the simple/correct thing here and we can attempt to optimize in the future (and can open a follow-up if others agree).

This implementation proposes a slight departure from existing Engine APIs: instead of returning a DeltaResult<()> we return Result<(), CopyError> with CopyError defined as:

old pieces on CopyError omitted
#[derive(thiserror::Error, Debug)]
pub enum CopyError {
    #[error("Destination file already exists: {0}")]
    DestinationAlreadyExists(String),
    #[error(transparent)]
    Other(#[from] Box<dyn std::error::Error + Send + Sync>),
}

It captures the only things we care about from the copy API perspective: either the destination already exists and we can return a nice error message to the user saying their commit has already been published (considering publishing is the main use case of this API for now) or we just got back some other random error which we don't really care what it is, but rather just something we can surface to the user and fail the overall publish API.

I've used this PR as an opportunity to introduce an Engine API more aligned with our pursuit of finer-grainer errors (especially for Engine trait) but happy to split out if we think it's better to just retain existing DeltaResult pattern.

Motivation

This PR will be used for commit publishing - basically copying commits from staged commits to published commits. See #1377 for some context on future usage.

This PR affects the following public APIs

New required method in StorageHandler trait: copy_atomic

How was this change tested?

new UT for default engine impl

@codecov
Copy link

codecov bot commented Oct 16, 2025

Codecov Report

❌ Patch coverage is 78.04878% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 84.79%. Comparing base (09a82a3) to head (0c5f48f).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
kernel/src/engine/default/filesystem.rs 86.48% 1 Missing and 4 partials ⚠️
kernel/src/engine/sync/storage.rs 0.00% 2 Missing ⚠️
kernel/src/listed_log_files.rs 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1400      +/-   ##
==========================================
- Coverage   84.80%   84.79%   -0.01%     
==========================================
  Files         118      118              
  Lines       30325    30366      +41     
  Branches    30325    30366      +41     
==========================================
+ Hits        25716    25748      +32     
- Misses       3382     3387       +5     
- Partials     1227     1231       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zachschuermann zachschuermann changed the title feat(catalog-managed): New StorageHandler required method: copy feat!(catalog-managed): New StorageHandler required method: copy Oct 20, 2025
@nicklan nicklan removed the request for review from scovich October 21, 2025 18:34
@zachschuermann zachschuermann changed the title feat!(catalog-managed): New StorageHandler required method: copy feat!(catalog-managed): New copy_atomic StorageHandler method Oct 21, 2025
// Read source file then write atomically with PutMode::Create
// This ensures: 1) atomicity 2) fails if destination exists
self.task_executor.block_on(async move {
let data = store.get(&src_path).await?.bytes().await?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there concerns about the size of the data we're loading into memory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but i'm advocating not to optimize here. i'll open a follow up to track this tho

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nicklan nicklan removed the request for review from OussamaSaoudi October 22, 2025 20:54
@zachschuermann zachschuermann merged commit 733a117 into delta-io:main Oct 22, 2025
20 of 22 checks passed
@zachschuermann zachschuermann deleted the engine-copy branch October 22, 2025 22:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants